Dataset for identification of queerphobia
Authors
Abstract
While social media platforms have implemented many algorithmic approaches to moderating hate speech, there is a lack of datasets on queerphobia, which has impeded efforts to automatically recognize and moderate queerphobic speech online. Queerphobic speech is speech that is intended to degrade, insult, or incite violence or prejudicial action against queer people, who are those from a sexuality, gender, or romantic minority. This speech results in worsened mental and emotional outcomes for queer people and can contribute to anti-queer violence. The goal of this study is to create a dataset of YouTube comments to further efforts to identify queerphobic speech. To construct the dataset, 10,000 comments were sourced from YouTube videos that represent queerness. Then, volunteers manually annotated each comment in accordance with specific guidelines. Various natural language processing (NLP) models were used to extract features from the text, and several classifiers used these features to categorize comments as queerphobic or non-queerphobic. These NLP models illustrate baseline performance on the data. In making this dataset, we hope to further research on the recognition of queerphobia in digital spaces and make online spaces safer for queer people. The dataset can be found at https://github.com/ShivumB/dataset-for-identification-of-queerphobia.
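The two-stage pipeline the abstract describes (extract features from text, then classify each comment) can be sketched as follows. This is a minimal illustration only, not the authors' method: the abstract does not name the feature extractors or classifiers used, so bag-of-words counts and a multinomial naive Bayes classifier are swapped in as stand-ins. The function names, the label scheme (1 = flagged, 0 = not flagged), and the toy comments are all hypothetical and are not drawn from the dataset.

```python
import math
from collections import Counter

# Hypothetical toy data, NOT taken from the dataset: 1 = flagged, 0 = not flagged.
TOY_COMMENTS = [
    "great video thanks for sharing",
    "really helpful and kind message",
    "awful people should go away",
    "go away nobody wants you here",
]
TOY_LABELS = [0, 0, 1, 1]


def tokenize(text):
    """Lowercase and split on whitespace (stand-in for a real NLP feature extractor)."""
    return text.lower().split()


def train_naive_bayes(comments, labels):
    """Collect per-label token counts and document counts from annotated comments."""
    word_counts = {}              # label -> Counter of token frequencies
    doc_counts = Counter(labels)  # label -> number of comments with that label
    vocab = set()
    for text, label in zip(comments, labels):
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_tokens = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # ignore tokens never seen in training
            score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TOY_COMMENTS, TOY_LABELS)
print(predict(model, "thanks for the kind video"))
print(predict(model, "go away"))
```

A real baseline on the 10,000 annotated comments would need a held-out test split and stronger feature extractors than raw token counts; the sketch only shows how the feature and classifier stages fit together.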
Similar resources
Cross Dataset Person Re-identification
Until now, most existing research on person re-identification has aimed at improving the recognition rate in a single-dataset setting. The training data and testing data of these methods are from the same source. Although they have obtained high recognition rates in experiments, they usually perform poorly in practical applications. In this paper, we focus on the cross dataset person re-identification...
Speaker Identification with VoxCeleb DataSet
In this project, we perform a text-independent speaker identification experiment with a newly released data set, VoxCeleb (2017)[1], which consists of celebrity interview audio clips downloaded from YouTube. It’s a challenging data set in the sense that there are often multiple vocal sources in the same clip. A Deep Neural Network (DNN) based on MFCC feature vectors is used as our baseline. It is c...
Arabian Horse Identification Benchmark Dataset
The lack of a standard muzzle print database is a challenge for research on Arabian horse identification systems. Therefore, collecting a database of muzzle print images is crucial. The dataset presented in this paper is an option for studies that need a dataset for testing and comparing algorithms under development for Arabian horse identification. Our collected da...
VoxCeleb: A Large-Scale Speaker Identification Dataset
Most existing datasets for speaker identification contain samples obtained under quite constrained conditions, and are usually hand-annotated, hence limited in size. The goal of this paper is to generate a large scale text-independent speaker identification dataset collected ‘in the wild’. We make two contributions. First, we propose a fully automated pipeline based on computer vision technique...
VISION: a video and image dataset for source identification
The forensic research community keeps proposing new techniques to analyze digital images and videos. However, the performance of proposed tools is usually tested on data that are far from reality in terms of resolution, source device, and processing history. Remarkably, in recent years, portable devices have become the preferred means to capture images and videos, and contents are commonly shared t...
Journal
Journal title: Journal of Student Research
Year: 2023
ISSN: 2167-1907
DOI: https://doi.org/10.47611/jsrhs.v12i1.4405